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Abstract We investigate the algebraic structure underlying the stochastic Taylor 
solution expansion for stochastic differential systems. Our motivation is to con- 
struct ejficient integrators. These are approximations that generate strong numer- 
ical integration schemes that are more accurate than the corresponding stochastic 
Taylor approximation, independent of the governing vector fields and to all or- 
ders. The sinhlog integrator introduced by Malham & Wiese (2009) is one exam- 
ple. Herein we: show that the natural context to study stochastic integrators and 
their properties is the convolution shuffle algebra of endomorphisms; establish a 
new whole class of efficient integrators; and then prove that, within this class, the 
sinhlog integrator generates the optimal efflcient stochastic integrator at all orders. 
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1 Introduction 

We consider the simulation of stochastic differential systems of arbitrary order 
N afi. We assume for yt G our system has the form 



Kurusch Ebrahimi-Fard 

Instituto de Ciencias Matematicas, Consejo Superior de Investigaciones Cientfficas, C/ Nicolas 
Cabrera, no. 13-15, 28049 Madrid, Spain 

Alexander Lundervold 

Department of Mathematical Sciences, Norwegian University of Science and Technology, N- 
7491 Trondheim, Norway 

Simon J. A. Malham ■ Anke Wiese 

Maxwell Institute for Mathematical Sciences, and School of Mathematical and Computer Sci- 
ences, Heriot-Watt University, Edinburgh EH14 4AS, UK 




Hans Munthe-Kaas 

Department of Mathematics, University of Bergen, Postbox 7800, N-5020 Bergen, Norway 



2 



Kurusch Ebrahimi-Fard et al. 



This system is driven by a d-dimensional Wiener process {W^, . . . ,W ) and gov- 
erned by a drift vector field Vb and diffusion vector fields Vi, . . . , V^. We use the 
convention Wj^ = t and interpret the stochastic integrals in the Stratonovich sense. 
Hereafter we will assume t £ K+ lies in the interval of existence of the solution. In 
general, we also suppose that the vector fields Vj : — )■ for i = 0, . . . , d are 
sufficiently smooth and non-commuting. We focus on solution series and their use 
in strong simulation schemes. The stochastic Taylor expansion for the fiowmap 
ft-yo'-^yt, taking the data yo at time t = Q to the solution yt at time t for the 
stochastic differential system above, is given by (see for example Baudoin 2004 or 
Lyons & Victoir 2004) 

ft = ^2 •^w{t) Vw 

w 

Here w = ai...an is a word with letters ai,...,an chosen from the alphabet 
A := {0,1,..., d}. The sum is over all possible words w in A*, the free monoid 
on A. All the stochastic information is encoded in the scalar random variables 
(Stratonovich integrals) 




The partial differential operators Vw '■= Vai o • • • o Va„ , constructed by composing 
the vector fields, encode all the geometric information. 

Strong numerical integration schemes for stochastic differential systems are 
based on truncating the stochastic Taylor expansion and applying the resulting 
approximate fiowmap over successive small computation subintervals spanning the 
global time interval of interest. More generally, across a computation interval [0, t], 
for any smooth map /: Diff(]R^) Diff(K^), we can: 

1. Construct the series at = f{(pt)', 

2. Truncate the series at to at according to a grading g{w) on the words w; 

3. Compute (pt = f~^(^t) and use this as the basis of a numerical scheme. 

For example, suppose / = id, the identity map. Then at is just the stochastic 
Taylor expansion cpt, which we split according to the grading g(w) as follows 

'fit = Jw Vw + Jw Vw , 

g{w)-^n g(iu)^n+l 

for n G N. A stochastic Taylor numerical scheme of strong order n/2 would be the 
first term on the right shown; the remainder is the last term. The grading g{w) 
here is determined by the variance of the stochastic integrals Jw; zero letters in 
w contribute a count of one while non-zero letters contribute a count of one-half 
towards g{w). An important technicality is to include in the integrator, the expec- 
tation of the terms in the remainder at leading order. This is because not including 
them would decrease the expected global order by one-half (an explanation can 
be found in Buckwar, Malham & Wiese 2012 or Malham & Wiese 2009). In other 
words, a stochastic Taylor integrator of strong order n/2 is 

0t= JwVw+ ^ E{Jw)Vw, 

g{w)^n g{w)=n+l 
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where the expectations of the Jw, here denoted E{Jw), are known analytically. The 
Euler-Maruyama and Milstein numerical methods correspond to the cases n = 1 
and n = 2, respectively, applied on successive computation subintervals with the 
vector fields evaluated on the initial data on each subinterval. Stochastic Runge- 
Kutta methods are constructed by replacing the partial differential operators Vw 
by finite differences. 

Another example is / = log so that ot = logifit- This is the exponential Lie 
series which is the basis of the Castell-Gaines method (see Castell & Gaines 1995, 
1996; also see Azencott 1982, Ben Arous 1989 and Castell 1993). Truncating the 
exponential Lie series to at generates a Lie polynomial in the Lie algebra of vector 
fields. We assume we can suitably approximately simulate the multiple integrals 
Jw retained in the truncation (more on this presently). Hence 0t = expat and our 
approximation yt to the solution yt across [0,i] can be generated as follows. For a 
given realization of the Jw terms retained, we simply solve the ordinary differential 
system u' = at o u ior u = u{t), for r € [0, 1] and m(0) = yo. Though we might 
achieve this analytically, more often one has to use a suitably accurate ordinary 
differential integrator. In either case we have u(l) ~ yt- 

We measure the accuracy of a strong order integrator by the root-mean-square 
of its local remainder 

n ■= <pt ~ <f>t- 

More precisely, we measure \\rt o j/oHl^ for each yo £ , i.e. the square-root of 
the expectation of the Euclidean norm of n o j/o- Let us now clarify an impor- 
tant issue. The order of a strong numerical method is determined by the set of 
multiple Wiener integrals simulated and included. Since the set of all multiple 
(Stratonovich) Wiener integrals is generated by those based on Lyndon words (see 
Reutenauer 1993, p. Ill and Gaines 1994), we need only simulate the multiple 
Wiener integrals indexed by Lyndon words. The other multiple Wiener integrals 
of that order can be computed by linear combinations of products of the appro- 
priate Lyndon word multiple integrals of that order or less. However Lyndon word 
multiple integrals of the same order cannot be generated as such (or from each 
other). Hence more correctly, the order of a method is determined by the set of 
Lyndon word multiple Wiener integrals included. Consequently, in general with 
this multiple integral set, a more accurate method can only have a better error 
constant and an improvement in order scaling is not possible. Throughout this 
article we assume we can suitably approximate/simulate the Lyndon word multi- 
ple Wiener integrals up to the order required. The bulk of computational effort 
in higher order strong simulation methods is devoted to this task, though there 
has been some recent advances on this front; see Wiktorsson (2001), Lyons and 
Victoir (2004), Levin & Wildon (2008) and Malham & Wiese (2011). 

Castell & Gaines (1995, 1996) proved that their simulation method of strong 
order one-half was asymptotically efficient in the sense of Newton (1991): it "mini- 
mizes the leading coefficient in the expansion of the mean-square errors as power 
series in the sample step-size". This property extends to their strong order one 
method when the diffusion vector fields commute. However, Malham & Wiese 
(2009) demonstrated that when the stochastic differential system above is driven 
by a multi-dimensional driving Wiener process and the governing diffusion vector 
fields do not commute, then a strong numerical simulation based on the expo- 
nential Lie series is not asymptotically efficient (independent of the vector fields) ; 



4 



Kurusch Ebrahimi— Fard et al. 



also see Lord, Malham & Wiese (2008). Malham & Wiese (2009) proved, in the 
absence of drift, that a strong simulation method generated by taking the sinhlog 
of the flowmap, truncating the resulting series and then taking the inverse sinhlog, 
is efficient to all orders. This means that the error of the sinhlog integrator is 
always smaller than the error of the corresponding stochastic Taylor integrator in 
the mean-square sense, independent of the vector fields. In this paper we: 

1. Show that the natural context to study stochastic integrators and their prop- 
erties is the convolution algebra of endomorpisms on the Hopf shuffle algebra 
of words; 

2. Establish a new class of efficient stochastic integrators using this algebraic 
structure (we include drift and grade according to word length); 

3. Prove that within this class, the sinhlog integrator generates the optimal ef- 
ficient stochastic integrator to all orders. By this we mean that the error of 
the integrator realizes its smallest possible value compared to the error of the 
corresponding stochastic Taylor integrator, in the mean-square sense. 

Our paper is structured as follows. In !j2] we demonstrate the direct relation 
between stochastic expansions and the convolution shuffle algebra of endomor- 
pisms on the Hopf shuffle algebra of words. We define an inner product struc- 
ture on the convolution shuffle algebra of endomorpisms that is based on the 
correlation measure between multi-dimensional stochastic processes in ^ We also 
demonstrate some natural useful identities and orthogonality properties of endo- 
morphisms therein. In 21 we prove results 2 and 3 stated above. Finally in ^we 
discuss the implications of our results and provide some concluding remarks. 



2 Stochastic expansions and the convolution shuffle algebra 

We introduce the convolution shuffle algebra and show it is the natural context 
to study stochastic expansions. For the moment, we proceed formally as the basic 
material can be found in the monograph by Reutenauer (1993); also see Remark[l] 
below. Dropping J's and y s, we can represent the stochastic Taylor series for the 
flowmap by 

if = w ^ w, 

w 

which lies in the product algebra K(A)gj^ ® 'K{K)^^ over the commutative ring 
K = M. We are interested in the following two Hopf algebra structures on ]K{A). 
One has shuffle m as product and deconcatenation A as coproduct (K(A)gj^ on the 
left). The other has concatenation as product and deshuffle as coproduct (K(A)^^ 
on the right). The unit, counit and antipode are the same for both algebras K(A)gj^ 
and ]K(A)|^Q. We denote the empty word 1 G ]K(A). The product of two terms in 
K{A)^j^®K(A)^„ is 

(it ® x){v ® y) = {uLuv) ® {xy), 

where we concatenate the words on the right representing the composition of vector 
fields. On the left, uluv represents the sum of all possible shuffles of the words u 
and V, representing the product of two multiple integrals. 
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Remark 1 The homomorphism from K(A)gjj(8)]K(A)^Q to the free associative algebra 
of vector fields is established as follows (see Malham & Wiese 2009, Section 2(c)). 
Let JJ denote the ring generated by multiple Stratonovich integrals and the constant 
random variable 1, with pointwise multiplication and addition. Also let V denote 
the set of all vector fields on R^. The flow map ip defined in the introduction 
lies in J{V) = 0„>q JJ ® Vn, where Vn is the subset of vector fields Vw with w 
of length 71. The linear word-to-vector field map k: K(A) — > V given by k: w i— > 
Vw is a concatenation homomorphism, i.e. k{uv) = k{u)k{v) for any u,v € A*. 
And the linear word-to-integral map fj,: M{A) — >■ JT given hy jj,: w i-^ Jw is a shuffle 
homomorphism, i.e. fi{u luv) = fi(u)fi{v) for any u,v £ A* (see Lyons, et. al, 2007, 
p. 35 or Reutenauer 1993, p. 56; this also underlies our choice to use Stratonovich 
integrals rather than Ito integrals which satisfy a quasi-shuffle relation). Hence the 
map /i (g) k: K.{K)^^ ® ^.{K)^^ — >• 0„>q JJ ® V„ is a Hopf algebra homomorphism 
which naturally extends to the free associative algebra of vector fields. 

Suppose we apply a polynomial or power series function to the flow-map 93, say 
/ = fW)- For example suppose / has a simple power series expansion 

00 

with coefficients {cn € K: n ^ 0}. Then the product in ]K(A)gjj ®M.{K)^^ implies 
that after rearrangement, we can always express the result in the form 

/H= {Fow)®w, 

where F € End(]K(A)g[j) is given by (|w| denotes the length of the word) 

H 

F O W = ^ ^ Cfc Ml LU . . . LU U/j. 

fe=o Mi,...,tifceA* 

W=Ui...Uk 

See Reutenauer (1993, p. 58) or Malham and Wiese (2009) for more details. This 
suggests we can encode the action of any such function / of the flowmap <p by an 
endomorphism F e End(]K(A>gjj). Indeed the embedding End(]K(A)gjj) IK(A)g^(8) 
^Wco given by 

X ^ ^X(w) ® w, 

w 

is an algebra homomorphism for the non-commutative convolution product defined 
for all X,Y e End(]K(A)3h) by 

X*Y = LU o{X^Y)oA. 

Since the coproduct A here is deconcatenation, which takes a word w and produces 
a sum of all possible two-partitions w (g) w of w, we see that explicitly 

{X-kY){w)= ^ X{u)inY{v). 

w=uv 
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Thus, we can encode and study the structure and properties of functions of the 
fiowmap through endomorphisms F G End(K(A)gjj), with convolution as product. 
For convenience, we henceforth denote 

H := End(K(A)3h), 

the convolution shuffle algebra of endomorphisms on K.{A)^^, which is a unital 
associative non-commutative K-algebra. The unit i/ in H is the composition of 
the unit and counit. Indeed i/ sends non-empty words to and the empty word 
to itself. For any Hopf algebra, by definition the antipode S is the inverse of the 
identity endomorphism with respect to the convolution. Thus we have 

S -kid = id* S = f. 

On K(A)gj^ the antipode S' G EI is given by S{ai . . .an) ■= (—!)"««... ai, i.e. it is 

the sign-reversing endomorphism. 

Remark 2 There is a dual convolution concatenation algebra which we could al- 
ternatively utilize; see Reutenauer (2009; Section 1.5). 

Remark 3 There is natural compatibility between convolution and composition in 
H. For an algebra homomorphism Z e H one verifies that Zo[X-kY) = {ZoX)-k(Zo 
Y). For a coalgebra homomorphism Z e H we have (X-kY)o Z = (X o Z)-k{Y o Z). 

As the shuffle product is commutative, one can show that if X, Y G H are algebra 
homomorphisms, then X -k Y is also an algebra homomorphism. The subset of 
algebra homomorphisms forms the group H CM. of IK(A)gjj-valued characters, with 
unit ly. The inverse of X e 'H is given is a subgroup of 

the Lie group 

g := {XeM: X(l) = l}. 

The corresponding Lie algebra f) of infinitesimal characters is a Lie subalgebra of 
the subalgebra 

g:= {XgH: X(1) = O}, 

of H. The inverse in Q is given by := Efe^o('^-^)*''- Note that we denote 

by X*'' the fe-factor convolution product X * X * ■ ■ • * X for any X G H. Observe 
that, for X € g, ii w has length less than k then X*'' o w returns 0. Hence the 
formal sum for X*^~^^ makes sense and indeed, note that (u — X) £ q for X £ Q. 
In fact we observe that Q = {u} © g. See Manchon (2008), Patras & Reutenauer 
2002 and Patras (1994) for more details. 

An important endomorphism is the augmented ideal projector given by 

J := id — v. 

Thus J sends non-empty words to themselves, but the empty word to 0, i.e. J € g. 
Note for example, J**^ takes a word w with \w\ ^ k and creates a sum of all possible 
fe-partitions of w shuffled together (with the empty word an excluded partition). 
On the other hand, id**^ also includes all partitions involving the empty word. Now 
observe that the endomorphism F corresponding to the function / of the fiowmap 
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above, with the power series coefficients {cfc}, is the endomorphism defined by the 
series in H: 

oo 

r(id) = ^cfeid*^ 

k=0 

More generally, we can consider functions on H. For example, for X € H, 
with X{1) = el and e G K, we can construct an endomorphism through a series 
expansion about a multiple e of the unit i' as follows: 

oo 
k=0 

where X — eu £ q. Of particular interest are the bijective logarithm log*: Q ^ Q 
and exponential exp* : G maps defined for any X € g hy 

iog*(i. +x) = Y^ -^^^ <^^p*(^) = H h^*'- 

k=l k=0 

The sinhlog and coshlog maps are defined for X e ^ by 

sinhlog*(X) = and coshlog* (X) = i (X + X*^-^'), 

also have series representations in powers of {X — v). These maps and their com- 
positional inverses underlie our main result. For all X,Y €M, set h* to be 

h*{X,Y) := (X*2+y)*(V2)^ 
for which the square root exists. Then we have the compositional inverses 
sinhlog"^ (X) = X + h*{X, and coshlog"^ (X) = X + h*{X, -v), 

i.e. we have sinhlog"^ o sinhlog* (X) = X and coshlog"^ o coshlog* (X) = X. 

To illustrate this new perspective and its natural connection to stochastic ex- 
pansions, consider three examples. First, we observe that the stochastic Taylor 
expansion for the flowmap is simply the identity id e H. Second, the sinhlog func- 
tion considered by Malham & Wiese (2009) is given by 

sinhlog* (id) = \{\d-S), 

since 5* = id*'-"^^ In other words, applying the sinhlog function to the flowmap 
in ]K.(A)gjj (g) K(A)j,Q, corresponds to applying sinhlog* to the identity id € %. 
Third, the Eulerian idempotent, which is a Lie idempotent from the free associative 
algebra to the free Lie algebra, is given by 

log* (id) = J - i J*2 + 1 J*3 - . . . + (^i)^ 7*^= + • • • . 

This is the exponential Lie series or Chen-Strichartz formula (see Burgunder 2009, 
Chen 1957, Magnus 1954, Strichartz 1987 and Baudoin 2004). To see this we use 
that J*^ can be expressed as a sum over permutations with a prescribed descent 
set; see Reutenauer (1993; p. 65). Note also that since the antipode is the inverse 
of the identity with respect to the convolution product, then 

S = - J + J*2 - J*3 + . . . . 



8 



Kurusch Ebrahimi-Fard et al. 



Hence we can also express the sinhlog endomorphism as 

sinhlog*(id) = J - i J*2 + i _ . . . ^ 1 + . . . . 

These three examples belong to the subalgebra of endomorphisms generated by 
the unit v and augmented ideal projector J. 

Remark 4 Note that the sinhlog and coshlog endomorphisms are projectors as 
i(id±S')o i(id±5') = KidiS) and i(id ± S) o i(id t S) = 0. 

We conclude this section by defining some endomorphisms and their properties 

useful in our subsequent analysis. 

Definition 1 (Reversing and sign endomorphisms) These endomorphisms in 
HI are defined for any word w = ai . . . an £ A* as follows: (1) Reversing endomor- 
phism: \S\: w ^ an ■ ■ ■ oi; and (2) Sign endomorphism: D: w ^ (— 

Note, for example, that _D o D = id and S = -D o l^l = \S\o D. Observe also that 
since S G U we have \S\{uujv) = \S\{u) m \S\{v). 

3 Convolution shuffle algebra with expectation inner product 

We have seen that classes of endomorphisms F e H correspond to functions of 
the flowmap. Our goal here is to define an appropriate inner product on H and 

analyze its properties. This will be modelled on the mean-square measure of an 
M^-valued stochastic process, constructed as follows (hereafter the real parameter 
t > is fixed). 

3.1 Expectation endomorpism 

Let D* C A* denote the free monoid of words on the alphabet D = {0, 11, . . . , dd}. 

Definition 2 (Expectation map and endomorphism) The expectation map is 
the linear map E: ]K(A)gjj — >• K which, in the case A indexes d independent Wiener 
processes, is given by 

- ft"(™)/(2'^("')n(w)!), weD*, 

E: < ^ ' 

[0, 'u;€A*\D*. 

Here d(w) is the number of non-zero consecutive pairs in w from the alphabet D 
and n(ui) = z(w) -|- d(ui), where z(ui) counts the number of '0' letters w contains. 
We define the expectation endomorphism £ £ H as 

E:w^ E{w) 1. 

Note the endomorphisms X — E o X = (id — E) o X va.M. have zero expectation. 
Indeed (id — E) lies in the kernel oiEssEoE = E. 

Remark 5 Strictly, the expectation map E: 'K{K)^^ — > IK[f], where t is a parameter 
commuting with all of K.(A)gjj, and (id — E) o X takes values in K[t] 1 ® IK.(A)gjj. 
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Remark 6 The values quoted for the expectation map for any w e A* are the 
expectations of the corresponding multiple Stratonovich integrals, see Kloeden 
& Platen (1999; eqns (5.2.34), (5.7.1)) or Buckwar et al. (2012). Briefly, every 
Stratonovich integral labelled by w £ A*, is a linear combination of Ito integrals. 
The Ito integrals concerned, cycle through the set of words which consist of w 
and all words u obtained by successively replacing any two adjacent non-zero 
equal indices in w by 0. Each replacement contributes a factor one-half to the 
coefficient of the Ito integral in the linear combination. Since the expectation of 
any Ito integral is zero unless all its labelling letters are 0, the expectation of the 
Stratonovich integral is given by the deterministic integral remaining after the 
replacement process (and zero if there isn't one). 



3.2 Inner product of endomorphisms 

Let {Vu,}^gA* denote a given set of indeterminate vectors indexed by words w € 
A* . We use both (u, v) and Vuv to denote the inner product of the vectors and 
Vv . Let V denote the infinite square matrix indexed by the words u,v G A* with 
entries V^,^. 

Definition 3 (Inner product) We define the inner product oi X,Y € H with 
respect to V to be 

{X,Y)m.= Y1 E{X{u)^Y{v)){u,v). 
The norm of an endomorphism X G H is \\X\\-a := {X, X)^^"^. 

Let us motivate this definition and provide some equivalent characterizations. Sup- 
pose we apply two separate functions to the flowmap ip which are characterized 
by the endomorphisms X and Y in H. We assume the governing vector fields and 
driving Wiener processes are given as well as some data yo € R^. With a slight 
abuse of notation we express the two stochastic processes xt and yt associated with 
X and Y as follows 

= ^ X{w)Vwiyo) and = ^ Y{w)Vw{yo)- 

Our definition is based on the L^-inner product {xt,yt)L^ = E{xt,yt)- We would 
like our inner product to be independent of the data yo, hence we replace the 
vector fields evaluated at the data by the set of indeterminants {\/w}weA*- 

An equivalent characterization of the inner product is as follows. The action 
of an endomorphism X on a word can be written X{w) = J2ueA* ^w,uU, for some 
K- valued coefficients Xw,u- In other words we can represent X G H by a matrix 
X G K°° X IK°° indexed by the words w G A*. We can order the indexing by 
word length and then lexicographically within each length (for example) . Now let 
W G IK°° X ]K°° denote the symmetric matrix of values E{uujv} over all m, w G A* 
and V G ]K°° x 1K°° the symmetric matrix defined above. Then using our definition 
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above, we observe that 

,V 

= tr (XWY^ V^') 

= tr (v^xwyt (V^)'''), 

where f denotes matrix transpose. Note that both W and V are positive definite. 
Hence we can view the definition of the inner product above as defined with respect 
to the weights W and V where the stochastic and geometric information are re- 
spectively encoded. All results we subsequently establish will hold independent of 
V. Lastly, we now note that our inner product is: (i) Symmetric as the shuffle prod- 
uct and vector inner product are commutative; (ii) Bilinear as the endomorphisms 
and expectation are linear; (iii) Positive definite as the matrix of expectations W 
is positive definite. 

Remark 7 We assume that the solution to our stochastic differential system for any 
data 2/0 is i^-valued on the time interval of interest. This is equivalent to saying 
that ||id||ji is finite, which is equivalent to assuming tr (WV) is finite. 

3.3 Graded class subspace 

We often will be concerned with endomorphisms that act on subspaces of ]K(A)gjj 
selected according to a grading. Recall that K{A)^^ is a connected Hopf algebra 
graded by the length of words. 

Definition 4 (Grading map) This is the linear map g: K(A)gj^ — >■ Z+ given by 

g : w I— >■ 

The empty word has length zero, i.e. |1| =0. 

Remark 8 There is another natural grading on A* given by the variance of the 
words w G A* , as mentioned in the introduction. This is determined by the expo- 
nent in t when computing the root-mean-square deviation from the expectation of 
w, i.e. the square-root of £" o (m — E{w)) Briefiy, for any word w, zero letters 
contribute a count of one while non-zero letters contribute a count of one-half 
towards the grade value. The nuances of the two gradings are revealed in ij5l 

Definition 5 (Graded class subspace) For a given n £ let denote the 
subspace of K(A)gjj of all words w of given length g(w) = n. We set 

S^«:=0Sfc and §^„:=0§fe. 

A subspace § is a graded class subspace if, for a given n £ '^+, S = Sn, S = S^,i or 
S = For any graded class subspace §, we denote by 

7rs:K(A),i^^§, 

the canonical projection from ]K{A)gj^ onto §. 
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Hereafter we set, for any graded class subspace §, for all X, V G H: 



{X,Y) := (Xo7rs,y oTTs) 



and 



||X|| := XoTTs 



We carefully state the subspace § in all instances, so no confusion should arise. 
3.4 Properties of the sinhlog and coshlog endomorphisms 

The following lemma from Malham & Wiese (2009) is a crucial ingredient in what 
follows. We restate it here and discuss an extension we rely upon for clarity. 

Lemma 1 (Malham &: Wiese (2009); Lemma 4.3) For any pair u,v € A*, we 

have E [uLuv) = E (^[\S\ o u) lu (15*1 ° v)) ■ 

Though the context for this lemma was the alphabet A = {l,...,d}, it in fact 
extends to A = {0, 1, ... , d}. The proof detailed in Malham & Wiese relies on two 
results. First, any Stratonovich integral can be expressed as a linear combination 
of Ito integrals, as described in Remark [6] (also see Kloeden & Platen 1999; eqn 
(5.2.34)). Importantly, reversing the word associated with a Stratonovich integral 
generates a mirror linear combination of Ito integrals with their respective words 
reversed (with the same coefficients). Second, the expectation of the product of any 
two Ito integrals only depends on the number of non-zero letters and the lengths 
of subwords containing only letters. These characteristic quantities are invariant 
to reversing the words concerned (see Kloeden & Platen 1999; Lemma 5.7.2). Both 
of these results from Kloeden & Platen are stated for A = {0, 1, . . . , d}. 

Remark 9 Importantly, note that if u and v have the same length, then we can 
replace IS*] by S as the result is then invariant to the sign in the antipode S. This 
observation underlies the restriction to § = §„ when A = {0, 1, . . . , d} in Lemma[2] 
below. 

We now state the main lemma of this section, outlining properties of the principle 
endmorphisms we have thusfar highlighted, in particular the endomorphisms 



The results herein are used to establish the optimal efficiency properties of the 
sinhlog integrator in [21 

Lemma 2 We assume either, A = {0,1,..., d} and S is the graded class subspace 
§ = §n, or, A = {1, . . . , d} and § is any graded class subspace. Then for any X,Y G II 
( and for any V) we have the following properties: 



1. {X,Y) = (IS*! oX, l^l o Y); 

2. \\S\, \S\) = {S,S) = (id,id); 

3. (sinhlog* (id), coshlog* (id)) = 0; 

4. \\idf = ||sinhlog*(id)f + II coshlog* (id) f; 

5. {X,EoY) = {EoX,EoY); 

6. (£;oid,£oid) = {Eo\S\,Eo\S\) = {EoS,EoS}; 

7. {E o sinhlog* (id), _E o coshlog* (id)) = 0; 

8. (|5|, J*") = (id, J*"), 



sinhlog* (id) 



i (id - 5") and coshlog* (id) = i (id + 5") . 
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where property 8 only holds for S = Sn, independent of the alphabet. Property 3 indi- 
cates sinhlog* (id) and coshlog*(id) are orthogonal with respect to the inner product. 

Proof We establish properties 1-8 in order. Using the linearity properties of the 
expectation map and reversing endomorphism |5| we observe that 

VW2 E (wi LU W2) 

Wl ,W2 

= ^ X„ujiYt,u,2 _E ((IS*! o TOi) LU o W2)) 

=E{{\S\oX){u)^{\S\oY){v)), 

and property 1 follows. Property 2 is a special case of property 1. Using property 2 
and Lemma [1] we deduce that 4{sinhlog*(id), coshlog*(id)) = (id, id) — {S,S) = 0, 
thus establishing property 3. Property 4 follows directly when we observe that 
sinhlog and coshlog additively decompose the identity, i.e. id = sinhlog* (id) + 
coshlog*(id). Properties 5-7 now follow from the properties of the expectation 
endomorphism. Property 8 follows directly from the commutivity of the shuffle 
product. 



4 Sinhlog and the optimal efficient integrator 

Recall our construction of a stochastic integrator for a given smooth function 
/: Diff(R^) — >• Diff(R^) we outlined in steps 1-3 in the introduction. In terms of 
the convolution shuffle algebra and corresponding endomorphism /*(id) G H, those 
steps and concepts therein, have natural concise translations. Proceeding through 
the construction, by direct analogy with tpt , a general stochastic integrator has the 
form 

id := /"^ o 7rs^„ o /*(id). 
The error associated with this approximation is 

7? := id - id. 

One integrator will be more accurate than another if the H-norm of its associated 
error ||-R||h is smaller than the other. 

Remark 10 In light of our comments in the introduction that the set of Lyndon 
word multiple integrals included in an integrator determine its order, we will re- 
strict ourselves to functions /* : EI — > H that are grade preserving. All the functions 
herein such as the logarithm, exponential, sinhlog, coshlog or any power series in J 
are grade preserving (their action on words generates linear combinations of words 
of the same length) . 

Remark 11 Also as mentioned in the introduction, we need to include the expecta- 
tions of the terms in the remainder in Sn+i in the integrator. These are in general 
encoded by iJ o vrg^^j o _R, so the effective leading order terms in the remainder are 
(id — E) o Tv^n+i ° R- We highlight this at the appropriate junctures. 
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Definition 6 (Pre-remainder) We define the pre-remainder Q to be the remain- 
der terms after applying the endomorphism /*(id) and then truncating to include 
the terms in S^„, i.e. it is 

Q:= r(id)-7rs^„o/*(id). 

The relationship between R and Q plays a key role in our subsequent analysis. 

We now examine the properties of integrators based on the sinhlog endomor- 
phism, indeed of a whole class of endomorphisms to which it belongs, as well as 
the coshlog endomorphism. First we focus on the sinhlog endomorphism and re- 
establish that it is an efficient integrator, in the context of the convolution shuffle 
algebra. As we do so, we will see natural questions that emerge, motivating us to 
dig deeper. 

Definition 7 (Efficient integrator) A numerical approximation to the solution 
of a stochastic differential equation is an efficient integrator if it generates a strong 

numerical integration scheme that is more accurate in the mean square sense than 
the corresponding stochastic Taylor integration scheme of the same strong order, 
independent of the governing vector fields and to all orders. In other words, if R 
denotes the remainder of an integrator, it is efficient if for § = for any n, and 
for any V: 

||(id-£;)oi?f ^ ||(id-s)oidf. 

The sinhlog endomorphism of interest has the expansion 

sinhlog* (id) = J - i J*2 + I J*3 - 1 J*4 + . . . . 

To construct a sinhlog integrator of strong order n/2, we start by applying the 
projection operator 7rs<„ to sinhlog* (id); call the result 

P := ■ns<n ° sinhlog*(id). 

Since is zero for any words w with |w| < n + 1 and each of the terms in 

the expansion above is grade preserving, we see that 

P= (j-ij*' + --- + i(-l)"+V*") o7rs^„. 

The pre-remainder is thus given by 

Q = sinhlog* (id) o tts^^^j. 

The compositional inverse of the sinhlog endomorphism is given by sinhlog"^ oX = 
X + h*{X, +v) where h*{X, := (i^ + x*2)*(V2) has the expansion 

h*{X, +u)=iy + iX*2 - ix*^ + ■■■ . 
By definition, the error R in the approximation id = sinhlog"^ o P is given by 

R = id — sinhlog"^ o P 

= sinhlog"^ o {P + Q) — sinhlog"^ o P 
= Q + h*iP + Q,+u)-h* (P, +u) 
= Q+i((P + Q)*2-P*2)+... 
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= Q + i(P*Q + Q*P) +0(Q*^). 
The leading order term in R is 

Q ° i"s„+i- 

We justify this as follows. Consider the term P-kQ. Since at leading order sinhlog'^(id) = 
J, we see at leading order 

P-kQ= (JOTTS^,.) ★ (JOTTS^^^J. 

This means that for some coefficients Cw G K, 

^((P ★ Q) o w) ® m = CujU'ig'u;. 

This follows from the fact that the two indicator functions annihilate any lower 
order terms in the two-part deconcatenation implied by the convolution, and that 
J annihilates the empty word. We also only retain the leading order terms in Q 
by applying vrg^^^ as shown in the final step above. 

Remark 12 We shall use the big O notation such as 0{P -kQ) or 0{Q*^) above to 
denote endomorphisms that only generate terms involving words that are higher 
order with respect to the grading g than those generated by the preceding endo- 
morphisms. 

Thus using Lemma [51^3) with § = S„+i we have 

llidf = IIQf + ||coshlog*(id)f . 

Now using id = sinhlog*(id) + coshlog*(id) and Lemma[2j5,7) we deduce that 

||(id-£)oidf = ||(id-_B)oQ||V ||(id-£)ocoshlog*(id)f . 

Recall from Remark [11] that we need to include in our integrators the expectation 
of the leading order terms in the remainder. This means the remainders for the 
stochastic Taylor expansion and sinhlog integrators are (id — E o id) o vrg^^^j and 
{Q — EoQ)oTvg^^^, respectively, since at leading order R = Qotts^+i - We have thus 
established that integrators based on the sinhlog endomorphism are at least as 
accurate as corresponding stochastic Taylor integrators at leading order, i.e. they 
are efficient. 

We can prove the following extension of Corollary 4.2 in Malham & Wiese 
(2009) to the alphabet A* = {0, 1, . . . ,d}; indeed assume this to be the alphabet 
hereafter. 

Lemma 3 With § = Sn+i, if n is odd, then the error of the sinhlog integrator is 
optimal; it realizes its smallest possible value compared to the error of the corresponding 
stochastic Taylor integrator. 
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Proof To prove this, order by order we perturb the sinhlog integrator as foUows. 
For a fixed n G N perturb the coeflicient of in the sinhlog expansion so 

that 

oo 

sinhlog:(id) = J + i + eJ*("+i), 

k=2 

where e £ M is a parameter. Then repeating the procedure above, we have 

Qe = (sinhlog*(id) + eJ*("+i)) o 7rs^„+,. 

Since the leading order behaviour of the inverse of sinhlog"^ is unaffected, we have 
at leading order Re = Qe. As above, truncate Qe with tvs„^i- Then with § = Sn+i 
we see 

llidf = (Q, + coshlog*(id) - eJ*("+i\Qe + coshlog*(id) - 

= \\Qef + 2(0,,coshlog*(id) - eJ*("+i') + ||coshlog*(id) - eJ*("+i)f 
= \\Qef + ||coshlog*(id)f - 2e(sinhlog*(id), J*("+i)) - e^jj J*("+i)f 
= \\Qef + ||coshlog*(id)f - e(id - S, J*("+i)) - e^ii j*(„+i)||2^ 

where in the penultimate step we used the orthogonality property of sinhlog and 
coshlog. When we include the expectations of the leading order terms in the re- 
mainder (cf. Remark I lip this relation becomes 

||(id-_B)oid||^ = ||(id-_B)o(3ef + ||(id- ocoshlog*(id)f 
- e((id - E) o (id - S), (id - E) o J*("+i)) 
-e2||(id-£)o J*("+l)f . 

The linear term in e is zero if n is odd, by Lemma [2j8). Hence in this case, the 
mean-square excess, the terms on the right other than ||(3c||^, is optimized when 
e = 0. 

Remark 13 This result can be found in Malham & Wiese (2009) for A = {1, ... , d). 
It extends to A = {0,1,..., d} using Lemma [21 though under the proviso that 
we grade according to word length. This means that we endeavour to include 
some multiple integrals in our stochastic integrator involving the drift '0' index 
that we would not ordinarily include if we were grading according to variance 
(of the multiple Wiener integrals). However we take the perspective here that 
there are far fewer of these terms at each grading according to length, and the 
computational effort associated with their simulation is a small fraction of that 
overall. If we include them, we guarantee efficiency. See Sj5] for an example. Of 
course if A = {1, . . . ,d} only, then the two notions of grading coincide and this 
technicality is redundant. 

Some natural questions now arise. It is apparent that the first key result in the 
argument above to prove efficiency was Lemma[2j4) showing that the norm-square 
of the identity endomorphism decomposes into the sum of the norm-squares of the 
sinhlog and coshlog endomorphisms. The second key result, to prove that the 
sinhlog integrator was optimal when n is odd, was Lemma[2|8). First, with regard 
to efficiency and Lemma [2]^4). Since there is no immediate apparent difference, it 
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would seem we could choose coshlog as our integrator endomorphism. Further, the 
result of Lemma [21^4) essentially relied on the fact that {S,S) = (id, id). However, 
we also know from Lemma [2j2) that {15*1, 15*1) = {id, id). Thus, in principle, we 
could also consider i(id — \S\) as an integrator (or indeed any pair ^(id±X) 
for which {X,X) = (id, id)). Note that ^(id— \S\) is the equivalent of applying 
the sinhlog endomorphism at even orders and the coshlog endomorphism at odd 
orders. Second, with regard to optimal efficiency, it would also seem that i(id-|S'j) 
would deliver optimality at both even and odd orders as the linear term in e above 
would be zero. However, this is not the case. The key is the relation between the 
error R and the pre-remainder Q. 

We define the following class of endomorphisms, set 

f*{X;e) :=i(X-rf*(-i)). 

for any e G M. Then we see /*(id, +1) is the sinhlog endomorphism and /*(id, —1) 
is the coshlog endomorphism. Note that the compositional inverse of f*{X; e) is 

f-\X;e)=X + h*{X,eu), 

where h* = h*{X,Y) is the convolutional square root of X*'^ + Y given in the 
introduction. Our main result is as follows. 

Theorem 1 For every e > the class of integrators /*(id; e) is efficient. When e = 1, 
the error of the integrator /*(id; 1) realizes its smallest possible value compared to the 
error of the corresponding stochastic Taylor integrator, i.e. if Re denotes the remainder 
of the integrator, we have that the mean-square excess 

||(id- S) oidf - ||(id- S) o7?ef 

is positive and maximized at e = 1. Thus a strong stochastic integrator based on the 
sinhlog endomorphism is optimally efficient within this class. 

Proof We prove the theorem in four steps. Note that we have for any e G R: 

r (id; e) = i(l - e)u + i(l + e).J - ^e{r' -J*' + ■■■). 

The integrator of interest of strong order n/2 is P := tts^^ o /*(id; e) and given by 

P = (l (1 - ,)^. + 1 (1 + e)J - -,r^ + ... + (-1)"J*")) o 

The pre-remainder is Q = /*(id;e) o tts^^^^. Further, the approximation id := 
/^^ o P results in an error given by 

i? = id - id 

= f"' °{p + Q)- .r' op 

= Q + h*{P + Q,eu)- h*{P, e u). 

Our goal now is to carefully examine this relationship between R and Q, and in 
particular, the difference on the right shown. 
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Step 1. A thorough understanding of the function h* is thus required, and 
we motivate our analysis by considering the corresponding real-valued function 
ft: -> M given by 

h: {x,y) (a;^ + j/)^/^. 

The function h = h{x,y) is real-analytic for y > —x^, zero on y = —x^ where 

its tangent plane is orthogonal to the (.t, j/)-plane, and complex-valued elsewhere. 
Hence we can choose a point inside the open region y > —x'^ about which the 
Taylor series for h = h{x, y) has a non-zero radius of convergence. In particular for 
y > —x^, the difference h{x + q,y) — h{x,y) at leading order in q is given by 

h{x + q,y)-h{x,y) = j-^^-^-q + --- . 

Step 2. By analogy with Step 1 we see in the convolution algebra we can expand 
R = Q + h*{P + Q,ev) - h*{P,ev) 

= Q + ((P + Qr + e .f^"^^ - (P*^ + e .f^"^^ 

★ (^(j. + (p*2 + ei.)*^-'' * (p* Q + g * p + g*'))*^'^'^ - 
= g + i (p^2 + e * ( (p * g + g * p) + o (g*^) ) . 

Further we note that at leading order 

P=i(l-e)i.+ l(l + e)J + 0(J*2), 
and thus for all e > —1 we have 

Substituting these expressions into that for R above, we get at leading order: 

i? = g + ^^g + o(j*g) 
= ^go7rs„,,. 

Step 3. Let us now compare the mean-square error of P to the corresponding 
stochastic Taylor integrator id o 7rs^„- We see that on S„+i we have 

||idf = (i? -I- id - P, i? + id - P) 

= \\Rf -\- 2{R, id - P) + (id - P, id - R) 

= \\Rf + 2(id - ^(id - eS), ^(id - 65)^ 

+ (id - 1^ (id - eS), id - j^(id - eS)\ 
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= \\Rf + llidf - ^^^(llidf - 2e(id, S) + e'WSf) 

= 11-^11'+ (TT^dlidf + (id,5)). 

If we include the expectations of the leading order terms in the integrator (again 
cf. Remark II ip then we have 

{id-E)oR = -|_(id-£)oQo7rs„^^. 
The comparison above becomes 
||(id-i;)oidf = \\{\d~E)oRf 

+ -p-^^ (||(id - E) o idf + ((id - E) o id, (id -E)oS 

Step 4- We see the mean-square excess is 

(l + g)2 (lKid - E) o idf + ((id - E) o id, (id -E)oS 
Note for any X, V G EI we have | (X, V) | ^ \\X\\ \\Y\\. Using Lemma[2]we see that 

||(id- £) o \S\\f = {\S\, \S\) - {Eo\S\,Eo\S\) 
= (id, id) - o id, £ o id) 
= ||(id-S)oidf . 

Hence we observe that ((id — _B) o id, (id — _E) o jSI) ^ || (id — _E) o id[|^, and thus 

< [((id-£)oid, (id-_B)oS')[ ^ || (id - _B) o id||^. 

Note all our statements above hold for any V. Hence for all e > 0, the mean-square 
excess is positive and thus P is a uniformly accurate integrator, thus establishing 
the first statement of the theorem. When e = the mean-square excess is zero as 
expected. For e < it is negative. Further we see that for e > the mean-square 
excess is largest when e = 1, corresponding to the sinhlog endomorphism. We have 
thus established the second statement of the theorem and the proof is complete. 

Note when e = — 1 the mean-square excess in Step 4 of the proof above is 
undefined. We can explain this as follows. The first equation in Step 2 becomes 

R = Q + i(P*^ - i.{P*Q + Q*P)+ 0{Q*^) . 

However now the leading order behaviour in P is P = -|- ^J*^ + C'(J*'^). If we 
denote P = P — v then we see that 

(P*2 _ = ((^ + p)*2_^^*(l/2) 

= ^/2P*(l/^)*(^+ip)'^(^/') 
= ^P*^^/^K(v+\P + 0{P* 
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Proceeding formally, we substitute these last two expansions into the expression 
for R above. Retaining leading order terms we find 

i? = Q + (J*^-^) o 7rs^„) *Q + 0(J*Q). 

In other words, the term in the error R corresponding to the difference of h* 
evaluated at P + Q and P generates the term (j*^"^-* o n^^^) * Q- The inverse of 
J is not an element of the group Q. It does have the formal expansion J**^~^^ = 
S + 5*^ + S*^ + • • • J but this does not have a finite evaluation on the empty 
word. Hence on §n+i, the term (j**-"^-* o 7rs<.„) * Q is not finite. However it is 
finite on — the inverse contracts the number of deconcatenations — but this now 
introduces terms in the remainder of the same order as those we retain in the 
integrator, i.e. we have a form of order reduction. 



5 Concluding remarks 

We established the sinhlog integrator is optimally efficient when we grade according 
to word length. For example, the sinhlog integrator specified by g(w) ^ 2 on the 
computation interval [tra,tn+i] is given by 

yn+i = sinhlog" ^(a-„,„+i) o 

where 

d d 

^n,n+l = '^^Ji{tn,tn+l)Vi + ^ hi^ij ~ Jji) {tn,tn+l)Vij . 
1=0 i,j=0 

Whatever the vector fields are, this is guaranteed to be more accurate than 

d d 
yn+l=yn + '^Jiitn,tn+l)Vi{yn)+ ^ Jij{tn,tn+l)Vij{yn), 

i=0 i,j=0 

which is the corresponding integrator based on the stochastic Taylor expansion 
according to the grading g(w) ^ 2. (A thorough numerical investigation confirming 
this conclusion in the diffusion-only case is presented in Malham & Wiese (2009). 
Further, Lord et al. (2008) demonstrate that high accuracy, higher order stochastic 
integrators can deliver greater accuracy for a given computational effort, than the 
Euler-Maruyama scheme, despite the cost associated with simulating the multiple 
Wiener integrals included.) First we note that at this order, is also the form 

of the exponential Lie series for g(w) ^ 2 (this is not true at any higher orders) 
which is not an efficient integrator. However at this order, the two integrators 
are distinguished when we compute their inverse endomorphisms to generate the 
corresponding approximations. Second, the inverse of the sinhlog endomorphism 
is sinhlog~^((7) = a + (id + cr^)^/^. If the vector fields are linear, an,n+i is a 
matrix and we can construct sinhlog"^ (cr) by computing the matrix square-root. 
If the vector fields are nonlinear we can expand the square root to sufficiently 
high degree terms (see Malham & Wiese 2009, Remark 5.3). Third, the stochastic 
Taylor based scheme above is a modification of the Milstein method where we 
additionally include terms involving Jjo, Joi and Joo for i = 1, . . . , d. 
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We considered here a comparison under the mean-square measure of the local 
errors. How the local errors accumulate to contribute to the global error was 
considered in Malham & Wiese (2009), where it was shown that the local error 
comparison transfers to the global error. 

If we know further structure in the stochastic differential system concerned, 
then the class of endomorphisms that are efHcient will widen. For example, if the 
diffusion vector fields commute with themselves, but not with the drift vector 
field, then we can deduce that the exponential Lie series is (trivially) an efficient 
integrator, again, under the proviso we grade according to word length. This is 
because in the remainder at leading order, the largest terms according to root- 
mean-square scaling with respect to stepsize, will involve words with non-zero 
letters. However these will not in fact be present as the diffusion vector fields 
commute. 

Our results have only relied on the symmetry properties of the expectation 
map (see Lemma and that it realizes positive values on words with a scaling 
according to word length. In particular they do not depend on the coefficients 
explicitly. Hence the sinhlog integrator is optimally efficient within the whole class 
of possible coefficients. In a slightly different direction, our result also holds for 
weak approximations of the multiple integrals, as long as the L^-norm of (sums 
of) integrals is preserved. We assume here, that the error is still being measured 
in the mean-square sense. 

Though in general the Eulerian idempotent is not efficient, it is a natural object 
in the construction of geometric integrators. Recently the Dynkin and Klyachko 
Lie algebra idempotents have proved to be important in the context of geometric 
integration, see Chapoton (2009), Patras & Reutenauer (2002), Munthe-Kaas & 
Wright (2007), and Lundervold & Munthe-Kaas (2011a,b). Interesting questions 
here besides the extension of their use to stochastic differential equations is which 
of these projectors generates a more accurate and efficient geometric integrator 
than the other? 
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